Leveraging machine learning to facilitate the optimization process is an emerging field that holds the promise of bypassing the fundamental computational bottleneck caused by classic iterative solvers in critical applications requiring near-real-time optimization. Most existing approaches focus on learning data-driven optimizers that require fewer iterations to solve an optimization problem. In this paper, we take a different approach and propose to replace the iterative solver entirely with a trainable parametric set function that outputs the optimal arguments/parameters of an optimization problem in a single feed-forward pass. We denote our method as Learning to Optimize the Optimization Process (LOOP). We show the feasibility of learning such parametric functions to solve various classic optimization problems, including linear/nonlinear regression, principal component analysis, transport-based coresets, and quadratic programming in supply-management applications. In addition, we propose two alternative approaches for learning such parametric functions, with and without a solver in the loop. Finally, through various numerical experiments, we show that the trained solvers can be orders of magnitude faster than classic iterative solvers while providing near-optimal solutions.
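The single-feed-forward idea above can be illustrated on a toy family of quadratic problems. The sketch below is not the paper's architecture: the problem family, network size, and training schedule are all made up for illustration, and the known closed-form optimum plays the role of solver supervision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem family: minimize f(x) = (x - t)^2 + lam * x^2,
# whose unique minimizer is x* = t / (1 + lam).  Instead of running an
# iterative solver per instance, we train a small network that maps the
# problem parameters (t, lam) directly to x*.
def optimum(t, lam):
    return t / (1.0 + lam)

# Training data: random problem instances with their known optima
# (this plays the role of "solver in the loop" supervision).
N = 2000
t = rng.uniform(-1, 1, N)
lam = rng.uniform(0.1, 2.0, N)
X = np.stack([t, lam], axis=1)      # problem parameters
y = optimum(t, lam)[:, None]        # optimal arguments

# Two-layer tanh MLP trained with full-batch gradient descent on squared error.
H = 32
W1 = rng.normal(0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
lr = 0.2
for epoch in range(3000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    # Backpropagate the mean squared-error gradient
    gW2 = h.T @ err / N; gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / N; gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# A single feed-forward pass now "solves" a fresh problem instance.
t_new, lam_new = 0.5, 1.0
x_hat = np.tanh(np.array([t_new, lam_new]) @ W1 + b1) @ W2 + b2
x_star = optimum(t_new, lam_new)    # analytic answer: 0.25
```

The trained network amortizes the cost of solving across instances: training is expensive once, but each new problem costs one matrix multiply instead of an iterative descent.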
The objective of this study is to develop a robust deep learning-based framework to distinguish COVID-19, Community-Acquired Pneumonia (CAP), and normal cases based on chest CT scans acquired in different imaging centers using various protocols and radiation doses. We show that although our proposed model is trained on a relatively small dataset acquired from only one imaging center using a specific scanning protocol, it performs well on heterogeneous test sets obtained with multiple scanners using different technical parameters. We also show that the model can be updated via an unsupervised approach to cope with the data shift between the train and test sets and to enhance its robustness upon receiving a new external dataset from a different center. We adopted an ensemble architecture to aggregate the predictions from multiple versions of the model. For initial training and development purposes, an in-house dataset of 171 COVID-19, 60 CAP, and 76 normal cases was used, containing volumetric CT scans acquired from one imaging center using a constant standard radiation dose scanning protocol. To evaluate the model, we retrospectively collected four different test sets to investigate the effects of shifts in the data characteristics on the model's performance. Among the test cases were CT scans with characteristics similar to the train set, as well as noisy low-dose and ultra-low-dose CT scans. In addition, some test CT scans were obtained from patients with a history of cardiovascular disease or surgery. The entire test dataset used in this study contained 51 COVID-19, 28 CAP, and 51 normal cases. Experimental results indicate that our proposed framework performs well on all test sets, achieving a total accuracy of 96.15% (95% CI: [91.25-98.74]), a COVID-19 sensitivity of 96.08% (95% CI: [86.54-99.5]), and a CAP sensitivity of 92.86% (95% CI: [76.50-99.19]).
Reverse transcription-polymerase chain reaction (RT-PCR) is currently the gold standard for COVID-19 diagnosis. It can, however, take days to provide a diagnosis, and its false-negative rate is relatively high. Imaging, in particular chest computed tomography (CT), can assist with the diagnosis and assessment of this disease. Nevertheless, standard-dose CT scans have been shown to impose a significant radiation burden on patients, especially those needing multiple scans. In this study, we consider low-dose and ultra-low-dose (LDCT and ULDCT) scan protocols that reduce the radiation exposure to close to that of a single X-ray, while maintaining acceptable resolution for diagnostic purposes. Since thoracic radiology expertise may not be widely available during the pandemic, we develop an Artificial Intelligence (AI)-based framework using a collected dataset of LDCT/ULDCT scans to study the hypothesis that the AI model can provide human-level performance. The AI model uses a two-stage capsule network architecture and can rapidly classify COVID-19, community-acquired pneumonia (CAP), and normal cases using LDCT/ULDCT scans. The AI model achieves a COVID-19 sensitivity of 89.5% ± 0.11, a CAP sensitivity of 95% ± 0.11, a normal-case sensitivity (specificity) of 85.7% ± 0.16, and an accuracy of 90% ± 0.06. By incorporating clinical data (demographics and symptoms), the performance further improves to a COVID-19 sensitivity of 94.3% ± 0.05, a CAP sensitivity of 96.7% ± 0.07, a normal-case sensitivity (specificity) of 91% ± 0.09, and an accuracy of 94.1% ± 0.03. The proposed AI model achieves human-level diagnosis based on LDCT/ULDCT scans with reduced radiation exposure. We believe that the proposed AI model has the potential to assist radiologists in diagnosing COVID-19 infection accurately and promptly, and to help control the transmission chain during the pandemic.
Lip-reading is the task of recognizing speech from lip movements. It is difficult because the lip movements for similar-sounding phonemes look alike. Visemes are used to describe lip movements during a conversation. This paper aims to show how external text data (for viseme-to-character mapping) can be used by splitting video-to-character conversion into two stages: converting video to visemes, and then converting visemes to characters using a separate model. Our proposed method improves the word error rate by 4% compared to a standard sequence-to-sequence lip-reading model on the BBC-Oxford Lip Reading Sentences 2 (LRS2) dataset.
Real-world robotic grasping can be performed robustly if complete 3D Point Cloud Data (PCD) of an object is available. In practice, however, PCDs are often incomplete when objects are viewed from a few sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
Research on automated essay scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition, which were then classified using a scoring model that was trained with the Bidirectional Encoder Representations from Transformers (BERT) language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
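For the uncertainty-based method, a common selection rule is to route the essays the current model is least sure about to human raters. A minimal sketch of one such selection step, assuming the scoring model exposes per-essay class probabilities (the probability values below are made up for illustration and are not from the study):

```python
import numpy as np

def select_most_uncertain(probs, k):
    """Pick the k essays with the highest predictive entropy.

    probs: (n_essays, n_score_levels) predicted probabilities.
    Returns indices of the essays to send to human raters.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:k]  # highest entropy first

# Toy batch of four essays over three score levels
probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> model can score it
    [0.40, 0.35, 0.25],   # uncertain -> worth a human rating
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # most uncertain
])
picked = select_most_uncertain(probs, 2)
```

The labeled picks are then added to the training pool and the scoring model is retrained, repeating until the labeling budget is spent.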
Classification using supervised learning requires annotating a large amount of class-balanced data for model training and testing. This has practically limited the scope of applications of supervised learning, in particular deep learning. To address the issues associated with limited and imbalanced data, this paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN), in which a conditional generative adversarial network (CGAN) is trained alongside the classifier and supplements semantics-conditioned, confidence-aware synthesized examples to the annotated data during the training process. In this setting, the CGAN not only serves as a co-supervisor but also provides complementary quality examples to aid the classifier training in an end-to-end fashion. Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN (EC-GAN) and a baseline ResNet-18 classifier. For the comparison, all classifiers in the above methods adopt the ResNet-18 architecture as the backbone. In particular, for the Street View House Numbers dataset, using 5% of the training data, a test accuracy of 90.26% is achieved by SEC-CGAN as opposed to 88.59% by EC-GAN and 87.17% by the baseline classifier; for the highway image dataset, using 10% of the training data, a test accuracy of 98.27% is achieved by SEC-CGAN, compared to 97.84% by EC-GAN and 95.52% by the baseline classifier.
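The confidence-aware supplementation step can be sketched as follows. The generator and confidence function here are stand-in stubs, not the paper's CGAN or classifier; the sketch only illustrates how high-confidence, class-conditioned synthetic examples might be mixed into an annotated batch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_generator(labels, dim=8):
    """Stand-in for a trained conditional generator: noise shifted by label."""
    return rng.normal(size=(len(labels), dim)) + labels[:, None]

def augment_batch(X_real, y_real, classifier_conf, threshold=0.9):
    """Mix confidence-filtered synthetic examples into a real batch."""
    y_synth = rng.integers(0, 3, size=len(y_real))   # sample class conditions
    X_synth = fake_generator(y_synth)                # class-conditioned samples
    conf = classifier_conf(X_synth, y_synth)         # classifier's confidence
    keep = conf >= threshold                         # confidence-aware filter
    return (np.vstack([X_real, X_synth[keep]]),
            np.concatenate([y_real, y_synth[keep]]))

# Toy usage with a stub confidence function
X, y = rng.normal(size=(4, 8)), np.array([0, 1, 2, 0])
conf_fn = lambda Xs, ys: rng.uniform(0.5, 1.0, size=len(ys))
X_aug, y_aug = augment_batch(X, y, conf_fn)
```

In the actual paradigm, the generator and classifier are trained jointly, so the quality of the supplemented examples improves as training proceeds.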
Based on WHO statistics, many individuals suffer from visual problems, and their number increases yearly. One of their most critical needs is the ability to navigate safely, which is why researchers are trying to create and improve various navigation systems. This paper provides a navigation concept based on visual SLAM and YOLO using monocular cameras. Using the ORB-SLAM algorithm, our concept creates a map of a predefined route that a blind person uses most often. Because visually impaired people are curious about their environment, and in order to guide them properly, obstacle detection has been added to the system. As mentioned earlier, safe navigation is vital for visually impaired people, so our concept includes a path-following part. This part consists of three steps: obstacle distance estimation, path deviation detection, and next-step prediction, all done with monocular cameras.
The National Association of Securities Dealers Automated Quotations (NASDAQ) is an American stock exchange based in New York City, and its index is one of the most valuable stock market indices in the world \cite{pagano2008quality}. The stock market is volatile, and economic indicators such as crude oil, gold, and the dollar influence it; NASDAQ shares are likewise affected and exhibit a volatile and chaotic nature \cite{firouzjaee2022lstm}. In this article, we examine the effect of oil, the dollar, gold, and stock market volatility on the economic market, and then examine the effect of these indicators on NASDAQ stocks. We then analyze the feedback effect of past NASDAQ stock prices on the current price. Using PCA and a Linear Regression algorithm, we design an optimal dynamic learning experience for modeling these stocks. The results obtained from the quantitative analysis are consistent with the results of the qualitative analysis of economic studies, and the modeling done with the optimal dynamic machine learning experience justifies the current price of NASDAQ shares.
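The PCA-plus-linear-regression pipeline described above can be sketched as follows. The data here are synthetic stand-ins for the oil, gold, dollar, volatility, and lagged NASDAQ series (the dimensions and coefficients are made up), so the sketch shows only the shape of the pipeline, not the study's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: columns play the role of daily oil, gold, dollar-index and
# volatility readings plus a lagged NASDAQ price; the target is today's price.
n = 300
X = rng.normal(size=(n, 5))
true_w = np.array([0.8, -0.3, 0.5, 0.1, 1.2])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Step 1: PCA -- center the features, then project onto the top-k
# principal components obtained from the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
Z = Xc @ Vt[:k].T                   # reduced feature matrix

# Step 2: ordinary least squares on the reduced features (with intercept).
design = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Regressing on principal components rather than the raw indicators trades some fit for robustness when the indicators are strongly correlated, as macroeconomic series typically are.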
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real world entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural response from a non-expert may be indirect: `let's make the green one'. Reference resolution has been little studied with natural expressions, thus robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of entity pairs and utterances, and develop models for the disambiguation problem. Consisting of 42K indirect referring expressions across three domains, it enables for the first time the study of how large language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.